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Abstract 

One of the key challenges in sensor networks is the extraction of information by fusing data from 
a multitude of distinct, but possibly unreliable sensors. Recovering information from the maximum 
number of dependable sensors while specifying the unreliable ones is critical for robust sensing. This 
sensing task is formulated here as that of finding the maximum number of feasible subsystems of linear 
equations, and proved to be NP-hard. Useful links are established with compressive sampling, which 
aims at recovering vectors that are sparse. In contrast, the signals here are not sparse, but give rise 
to sparse residuals. Capitalizing on this form of sparsity, four sensing schemes with complementary 
strengths are developed. The first scheme is a convex relaxation of the original problem expressed as a 
second-order cone program (SOCP). It is shown that when the involved sensing matrices are Gaussian 
and the reliable measurements are sufficiently many, the SOCP can recover the optimal solution with 
overwhelming probability. The second scheme is obtained by replacing the initial objective function with 
a concave one. The third and fourth schemes are tailored for noisy sensor data. The noisy case is cast 
as a combinatorial problem that is subsequently surrogated by a (weighted) SOCP. Interestingly, the 
derived cost functions fall into the framework of robust multivariate linear regression, while an efficient 
block-coordinate descent algorithm is developed for their minimization. The robust sensing capabilities 
of all schemes are verified by simulated tests. 

Index Terms 

Sensor networks, robust methods, multivariate regression, convex relaxation, compressive sampling, 
coordinate descent. 
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I. Introduction 

Recent advances in sensor technology have made it feasible to deploy a network of inexpensive sensors 
for carrying out synergistically even sophisticated inference tasks. In applications such as environmental 
monitoring, surveillance of critical infrastructure, agriculture, or medical imaging, the typical concept 
of operation involves a large and possibly heterogeneous set of sensors locally observing the signal 
of interest, and transmitting their measurements to a higher-layer agent (fusion center). This so-termed 
layered sensing apparatus entails three operational conditions: 

(cl) Each node's measurement vector comprising either a collection of scalar observations across time, 
or a snapshot of different sensor readings, is typically assumed to be linearly related to the unknown 
variable(s). Such a linear model can arise when the sensing system is viewed as a linear filter with known 
impulse response. Even when the underlying model is non-linear, the observations are approximately 
modeled as adhering to a (multivariate) linear regression; 

(c2) Either because readings are costly to sense and transmit, due to delay or stationarity constraints, or 
simply because dimensionality reduction is invoked to cope with the "curse of dimensionality," the linear 
model is oftentimes under-determined, i.e., the dimension of the unknown vector is larger than that of 
each sensor's vector observation; and 

(c3) Not all sensors are reliable because failures in the sensing devices, fades of the sensor-agent 
communication link, physical obstruction of the scene of interest, and (un)intentional interference, all 
can severely deteriorate the consistency and reliability of sensor data. 

Conditions (cl)-(c3) suggest that the fusion center should not simply aggregate all sensor measurements, 
but instead identify and discard unreliable sensors before estimating the unknown vector based on reliable 
sensor data. This task is henceforth referred to as robust sensing (RS), and provides context of the present 
paper. Discerning the unreliable sensors not only promises higher estimation accuracy, but also enables 
corrective actions to re-establish a sensor's reliability, by e.g., remotely directing the sensor to the area 
of interest, or, increasing its sensitivity. Even though the related problem of outlier detection in sensor 
networks has been studied extensively (see e.g., 11331 for a recent survey), the RS setup and the approaches 
described here have not been considered before. 

The first contribution of this work is to formulate the RS task as an optimization problem based on 
the sensor data, and show it to be NP-hard (Section ITT]) . The second one consists of two (sub)-optimum 
RS solvers (Section [TTTb. The first solver is expressed as a second-order cone program (SOCP) through a 
convex relaxation of the original NP-hard problem. The idea of convex relaxation has been employed in 
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the emerging area of compressive sampling (CS) O, |[28l . ll8l . CS asserts that a sparse vector (i.e., one 
having many zero entries) can be recovered with overwhelming probability as the vector with minimum 
^4-norm satisfying an under-determined system of linear equations; a setup known as basis pursuit (BP) 
j9ll , l8l . |29l . CS has been generalized to block-sparse signals, where the unknown vector comprises 
predetermined subsets of variables that are (non) zero as a group |[27l , |[26l , ifTTTl . 1511 . Block sparsity 
emerges also in the RS formulation herein, not in the unknown vector though, but in the per-sensor 
residual error vectors. The relation between recovering block-sparse signals and the developed RS solver 
nicely generalizes the equivalence of BP with ^i-error regression from the scalar to the vector case. 
As an alternative to convex relaxation, the £cr(P seu do)norm of the wanted vector can be replaced by a 
concave approximation to further promote sparsity |[T2l . Q. This constitutes the second RS solver, which 
surrogates the original objective by a concave function, and minimizes it through a sequence of weighted 
SOCPs. 

The third contribution consists in analyzing the performance (identifiability) of the convex relaxation 
approach to recover the unknown vector, and successfully select the reliable sensors in the noise-free case 
(Section ITVl) . The analysis hinges on a set of necessary and sufficient conditions on the involved matrix 
range space, which appear also in the context of |[27l . Here a lower bound expressed in closed form is 
established on the probability of success when the design matrix is drawn from the Gaussian ensemble; 
see also |[24l . It is shown that whenever there is sufficient majority of reliable sensors and quantifiably 
enough per-sensor measurements, the solution of the SOCP is exact with overwhelming probability. 

In real-world applications, sensor readings are contaminated by additive noise due to quantization, 
communication noise, and/or unmodeled dynamics. Besides identifiability, the aforementioned schemes 
are thus appropriate only for the high signal-to-noise ratio (SNR) regime. When the sparse vector in CS 
is observed in noise, its recovery is based on methods such as the Lasso ||28l , or the group Lasso for 
vectors that are block-sparse |[32l . Different from CS, the approach here views the unreliable sensors as 
outliers, thus placing the sensing in the presence of noise (RSN) task under a robust multivariate linear 
regression framework O, The fourth contribution of this work (Section [V} is initially formulating 
RSN as a combinatorial optimization problem that is subsequently surrogated by a convex approximation. 
Interestingly, the novel cost function turns out to be a block version of Huber's function IfTTll . The resultant 
optimization problem is transformed to a group Lasso-type SOCP, and a computationally attractive block- 
coordinate descent algorithm is developed. An alternative RSN solver is also offered after replacing the 
previously derived convex problem with a non-convex one. The simulated tests presented in Section [VTI 
corroborate the proposed schemes, and the paper is concluded in Section IVIII 
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Notation: Lowercase (upper-case) boldface letters are reserved for column vectors (matrices), and 
calligraphic letters for sets; (-) T denotes transposition; Af(m, X) stands for the multivariate Gaussian 
probability density with mean m and covariance matrix S, while E[-] denotes the expectation operator. 
The notation ||x|| p := (X^iLi NiP) 1 ^ f° r P = 1(2) stands for the £i(^2)-norm in M n , and ||x||o the 
^o-(pseudo)norm which equals the number of nonzero entries of x. 

II. Preliminaries and Problem Statement 

Consider an agent, e.g., an unmanned aerial vehicle, collecting data vectors {bj}^ =1 of size m; x 1, 
and corresponding m; x n regression matrices {Aj}^ =1 from k sensors. The goal is to find an unknown 
vector x G M n , possibly satisfying the linear subsystems of equations bj = A,x for some i G {1, . . . , k}. 
This goal is challenging since the unknown vector x satisfies only an unknown subset of sensors. The 
RS problem can be compactly stated as follows. 

Problem Statement 1 (Robust sensing (RS)). Given k vector-matrix pairs, {bj, Aj}^ =1 , where bj G M. mi 
and Aj G R miXTl j find a vector x G M. n that maximizes the number of feasible linear subsystems 
{hi = Aix}. 

Vector x could model a scene (lexicographically ordered image) of interest viewed by multiple and 
possibly heterogeneous, e.g., Infrared, SAR, or, Lidar imaging systems. Matrices Aj may capture variable 
fields of view, different perspectives and resolutions in some (e.g., wavelet) domain, or, calibration 
parameters of the respective sensors. Alternatively, in an environmental monitoring application, x could 
represent the unknown parameters of a chemical/biological compound diffusion field described by the 
Green's function captured by the matrices {Aj}^ =1 , and measured by a wireless sensor network deployed 
over a region of interest. In such sensing applications, a sensor may reckoned unreliable or irrelevant 
due to obstruction, fading propagation effects, device failures, jamming, or, even because it collects data 
corresponding to an irrelevant x' ^ x; see Fig. Q] 

The RS task is different for over- and under-determined linear subsystems. Assume that all Aj's are 
full rank, i.e., rank(Aj) = min{mj,n} for all i\ Then, suppose that the i-th linear subsystem is over- 
determined (rrtj > n). This subsystem is either infeasible and can be ignored, or, it admits a unique 
solution Xj. In the latter case, it can be easily checked whether iq satisfies any other subsystem. The 

'This is without loss of generality (w.l.o.g.), because every sensor with rank(Ai) < min{mi,n} will be either infeasible, 
or, it can be transformed to an under-determined subsystem with full row rank. 
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solution jq together with the total number of subsystems it satisfies are retained, and the method proceeds 
similarly with all other over-determined subsystems. However, checking the under-determined subsystems 
(mj < n) is more challenging, since each one of them admits infinitely many solutions. Recognizing 
that over-determined subsystems can be easily handled, this paper focuses on the RS task when rrii < n 
for all i. Note that under-determinacy may arise naturally because of stringent power, bandwidth, delay, 
or stationarity constraints. Given that the bj's (Aj's) can be padded with zero entries (rows) to match 
the dimension maxj m;, it will be henceforth assumed w.l.o.g. m-i = m < n for all i. 

Before proceeding, it is useful to introduce some parameters. The set of all subsystem indices is denoted 
by X := {1, . . . , k}, whereas the pair (S, S) denotes a partition of I into the subset S and its complement 
S (S U S = 1, S n S = 0). Consider now the \S\m x n matrix As constructed by concatenating the 
matrices {A,}j gl 5, and likewise for the vector bs. The aggregate regression matrix and data vector are 
defined as A T := [Af . . . AjT] and b T := [hf . . . b^], respectively. 

Upon introducing an auxiliary vector t 6 the RS problem can be rigorously posed as 

(Po) 




If the i-th subsystem is deemed feasible, then U = 0; otherwise, tj is strictly positive and the cost ||t| 



increases. In a nutshell, \Pq\ minimizes the number of infeasible linear subsystems, and hence solves 
RS. Note also that the constraints are satisfied as equalities at the optimum. Thus, if the optimum x is 
given, the optimum t is readily available. This implies that the solution pair (x,t) is identified solely by 
x, which will be henceforth called the solution of \Pp). 



Even though the constraints in \Pq) are convex, the problem is non-convex. A greedy approach to 



solving it would be to assume there are s feasible subsystems, and let s range from k down to 1. For 
each value of s, one can check feasibility of the linear systems b^ = A^x for each of the (^) subsets 
S C 1 having cardinality \S\ = s, until a feasible subset is found. But this approach incurs combinatorial 
complexity, and can be computationally feasible only for small-size problems. In fact, it is not difficult 
to establish the following result. 

Proposition 1. The RS problem is NP-hard. 

Proof: Consider first the following problem of maximizing the number of consistent linear equations 
(MCLE): "Given a system of linear equations Cx = d, where C E M fcxn and d £ M. k , find a vector 
x G R n satisfying as many equations as possible." The MCLE problem is known to be NP-hard (3j 
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Th. 1]. Consider an instance of the MCLE problem. Choose an integer m > 2 and define the instance 
of RS with parameters (b, A) selected as £>(i_i) m +i = (k and A^_i) m+ ij = dj for i = 1, . . . , k and 
j = 1, . . . , n; and for their remaining entries. Solving an MCLE problem is hence equivalent to solving 
an instance of an RS. This simple reduction of MCLE to RS establishes the proposition. ■ 



In search of sub-optimum yet computationally affordable solvers of < \Pp} , one could adopt the least- 
squares (LS) approach, which amounts to 

min ||b — Ax|||. (1) 

X 

Alternatively, one could consider minimizing the i^-norm of the error, namely 

min ||b — Ax||i. (2) 

X 

Unfortunately, both approaches handle separately every linear equation, and thus ignore the underlying 
per-sensor linear subsystem. In addition, they cannot reliably identify the unreliable sensors. 

III. RS Solvers 

A. A Convex Relaxation Solver 

It is known that if the infinity norm satisfies := maxj < 1, then the £i-norm ||t||i is the 

convex envelope (the largest convex under-approximant) of ||t||o; see e.g., ||6j p. 119]. This property is 



used also in CS [29], and prompts one to relax the NP-hard problem < \Pq^ to 



min ||t||i (3) 

x,t 

s.t. ||b; - Ajx|| 2 < U, i = 1, . . . , k. 

Note though that x here does not have to be sparse. The problem in ([3]) is an SOCP and can be efficiently 
solved by several existing algorithms 0. Invoking the implicit constraint t > and the definition of the 
^i-norm ||t||i := Yli=i l*i|> trie problem © is equivalent to 




(Pi) 

i=l 

which is still an SOCP, albeit unconstrained. 



The cost in (Pi I is the sum of the ^-norms of the residual vectors associated with the linear subsystems, 



which is continuous, but not differentiable. In the optimization circles, ([PjJ) is known as the minimization 
of the sum of (Euclidean) norms problem |6j Sec. 6.4]. It emerges also when solving problems related 
to Steiner trees, optimal location, and image restoration model constraints; see e.g., |[20l Sec. 2.2], and 
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references therein. Algorithmically, ( |Pi[ ) is tackled either by generic SOCP solvers, or, by interior-point 
algorithms customized to its specific form ll20l . 



Having relaxed the RS problem ( |Pq| ) to its closest convex approximation < \P^ which is tractable, it is 



of interest to reflect on various links and interpretations that ( |Pi[ ) can afford, postponing its performance 
analysis to Section ITVl 

Remark 1 ( lPi| ) versus LS). Clearly, the LS problem in £T|) can be rewritten as 

min{||t|| 2 : ||b, - Ajx|| 2 < U, i = l,...,k} 

x,t 



which is again a convex approximation of (Pq), though, as mentioned earlier, not the closest one. 



Remark 2 (([PjJ> versus block-sparse signal reconstruction). To establish this connection, assume that 
null(A r ) is non-empty. Let r, := bj — A^x denote the residual error vectors, and r T := [rj ■ ■ ■ r£]. 
Upon defining matrix C such that its null space is spanned by range(A), i.e., CA = 0, and d := Cb, 



the problem (Tpy) can be rendered equivalent to 

k 

min > llrJU (4a) 

r z — » 

i=l 

s.t. Cr = d (4b) 



which emerges when reconstructing a block-sparse vector r satisfying the under-determined system in (I4bl) 
(271, l26l . ifTTI . |5l . To establish the equivalence, write (Pj) as min r J2i=i \\ r i\\2 subject to r = b — Ax. 
Premultiplying both sides of the last equality by C, one arrives at (0]). The same equality couples the 
minimizers of the two problems: if ro solves (0]) and A"!" is the pseudo-inverse of A, then A^(b — ro) 
solves ( |Pi[ ). The optimization in (01) relies on the prior information that r is block sparse. For the RS 
problem, the vector of interest x is not (block) sparse; but the residual error vector is block sparse. 



Remark 3 (( |Pi[ ) versus l\ -error regression). In the degenerate case m = 1, where every subsystem reduces 
to a single equation, ( |Pi| ) reduces to the i\ -error minimization problem (0, which is known to be robust 
to outliers |[22l Ch. 4], 0, (H. Under the conditions stated in Remark [2l the unconstrained l\ -error 
regression problem is equivalent to the constrained optimization (cf. (0])) 

min ||r||i (5) 

r 

s.t. Cr = d. 

The problem in (f5]) is widely known in the CS literature as basis pursuit (BP); for a thorough treatment 
on this pair of problems see also (H. 



March 29, 2011 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING (REVISED) 



8 



B. A Concave Surrogate for RS 

Instead of substituting the cost ||t||o of ( |Pq[ ) by its closest convex approximation, namely ||t||i, letting 
the surrogate function be non-convex can yield tighter approximations. For example, the £o- norm of a 
vector x E W 1 was surrogated in Q by the logarithm of the geometric mean of its elements, or, by 
Yli=l \ x i\- ^ n ran ^ minimization problems, apart from the nuclear norm relaxation, minimizing the 
logarithm of the determinant of the unknown matrix has been proposed as an alternative surrogate; see 
|[T2l Sec. 5.2]. Building on this line of thought, consider surrogating ( |Po| ) by 



min 

x,t 


k 

J] log {U + S) 

i=l 




s.t. 


bj - AjX 2 <U, i = 1, . . 


.,k 



CP 2 ) 



where 5 is a sufficiently small but strictly positive constant preventing the cost from tending to — oo. The 
cost in is concave, but since it is smooth wrt t G iterative linearization may be utilized to obtain 
a local minimum lfT2l . fTl . Specifically, let (x^,t^) denote a tentative solution at the l-th iteration. Due 



to the concavity of the logarithm, the first-order approximation of log (U + 8) around t 



(t-l) 



log (U + S)< log I t\ 0) + S) + 



1 



4 0) + * v 



(0) 



+ 5 yields 
(6) 

Thinking along the majorization-minimization approach IfTSl . one can instead of minimizing the original 
cost on the left-hand side, minimize the majorizing cost on the right-hand side of ©, and iterate. 
Specifically, the minimization in ([7^]) can be iteratively driven to a local minimum lfT2l as 



! i 



AixlU < U 



+ 5 



l,...,k 



or equivalently, 



x^^ := argmin - — 
x i=i H bi 



lb,; - A,;x| 



(V) 



|x^ x ) II o becomes 



The iterative scheme can be terminated as soon as the relative error ||x^ — 1 ^||2/ ,, 
smaller than some e chosen equal to say 1CT 6 . The cost in © has the form of a weighted version of 

{l) - (lib, - AiX^^Ha + 5)^. When the residual 



where each of the error norms is weighted by w\ 
error of a subsystem is small, then the error of this system is weighted more during the minimization 
of the next iteration. A good initialization point for the iteration in ([7]) is the solution of $P\) that is 
equivalent to one iteration of $7} with all weights chosen equal. The simulated tests in Section |VT] will 



indicate that (fT]) can provide higher probability of identifying reliable sensors than 
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IV. Uniqueness and Identifiability 



Let s denote the minimum cost of §Fp) . Then, there exists at least one unknown xo € W 1 such that 
b,5 = A^xo for an unknown subset of sensors So with \Sq\ = s. The sensors in Sq will be referred to 
as reliable or consistent with respect to (w.r.t.) xo- Also, let j3 := s/k denote the number of consistent 
sensors over the total number of sensors; and 7 := n/(km) the ratio of the size of the unknown vector 
over the total number of measurements. 



Whether < \Pp$ has a unique minimizer, and hence an underlying xo can be uniquely recovered by 
( |Po| ), is considered next. The first thing to note at the outset is that when the consistent sensors w.r.t. 
xo are outnumbered by the unreliable ones, uniquely recovering xo is not guaranteed. This is because 
with s < k/2, there may exist an xi 7^ xq and an Si C X with \Si\ = |5q| = s and Si n Sq = such 



that h$ 1 = AsjXi; thus, xq and xi are both minimizers of §Pp\ . It is henceforth assumed that s > k/2 



or ft G (1/2, 1]. Under this assumption, uniqueness of the \Pp\ minimizer is further characterized in the 
following lemma. 



Lemma 1. Let vector xo be a minimizer of \Pq\ satisfying s > k/2 out of the k subsystems. This 
minimizer is unique if and only if 

rank(A,5 c ) = n (8) 
for every S c dX with cardinality \S C \ = 2s — k. 



Proof: Vector xo is not the unique minimizer of ( |Pq[ ) if and only if there exists at least one xi 7^ xo 
such that h$ 1 = A^xi for an Si C I with \Si\ = \Sq\ = s. Given that s > k/2, the two subsets cannot be 
disjoint; hence, they must have a non-empty intersection S c := 5oH5i with cardinality 2s — k < \S C \ < s. 
The subsystems belonging to S c are satisfied by both solutions; that is, bs c = As c xo = A,s e xi, which 
is equivalent to the existence of a nonzero z G R n such that A,s e z = or rank(A,s c ) < n. Multiple 



minimizers of \Pq) can thus be avoided if and only if rank(A5 c ) = n. Note that whenever rank(A,s c ) = n 
for every S c with |«S C | = 2s — k, it holds for every S c of larger cardinality as well. ■ 
Lemma Q] reveals two interesting points on uniquely recovering xo by ( |Pq| ). First, the reliable sensors 
should not only outnumber the unreliable ones, i.e., /3 > 1/2; condition © implies additionally that 
(2s — k)m > n, or j3 > (7 + 1) /2. Second, because /3 < 1, the inequality j3 > (7 + 1) /2 implies 7 < 1 
or km > n, requiring the total number of equations to be at least equal to the number of unknowns. 



Uniqueness of the < \Pp$ minimizer is also implied by the conditions stated in the next lemma. These 



conditions will be used in the next subsection. 
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Lemma 2. If for any nonzero v £ range(A) and any partition (S,S) of I with \S\ = s > k/2 it holds 
that 

J2hih>J2\Wi\\2 (9) 
where Vj jj f/ie z-f/z m x 1 Wocfc subvector of v, f/ien ([8]) is satisfied. 

Proof: Arguing by contradiction, suppose that (© holds, whereas ([8]) does not hold; or, in other 
words there exists an S c C I with \S C \ = 2s — k < s and rank(A,s c ) < n. Consequently, there exists 
a nonzero vector u £ M n such that A^u = 0. Next, partition X into three collectively exhaustive and 
mutually exclusive subsets S c , S\, and £2, with = | *S2 1 = k — s. Define also v := Au for which 
v,5 c = by the definition of u. 

Consider first (© with S = S c U S\ and S = S2, to deduce that 

Apply (O again for S = S C U S2 and S = Si , to arrive at 

which clearly contradicts the previous inequality and completes the proof. ■ 
Having introduced the convex relaxation of ( |Po| ), the next critical question is whether the solution 
of the former coincides with the solution of the latter. Even though the NP-hardness of ( jPp| ) forejudges 
that this cannot hold in general, the ensuing results show that for random Gaussian matrices A and under 
reasonable assumptions on the problem dimensions, equivalence of ( jPil ) and ( |Pq| ) occurs with probability 
exponentially decaying in n. The analysis starts by characterizing this equivalence using a set of necessary 
and sufficient conditions. 

A. Necessary and Sufficient Conditions 

The conditions under which the convex optimization problem \P\) yields the same solution as the 



NP-hard problem \Pq) are provided in the following theorem. Using the equivalence between \P\\ and 
d3]) under the conditions of Remark |2j this theorem is related to E71 Th. 2], which in turn, generalizes 
results from ifTOll to the block-sparse signal case. 



Theorem 1 (Range space conditions). Every xo minimizing $Pq) by satisfying s > k/2 out of the k 
subsystems is the unique minimizer of §P\\ if and only if 



X)||vi|| 2 >^||v i || 2 (10) 

i&S «65 
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for any nonzero v G range(A), and for any partition (S,S) of I with \S\ = s. 

Proof: See Appendix. ■ 
In words, Theorem Q] asserts that for every nonzero v G range(A), the sum of the s smallest ||vj||2 
components should be larger than the sum of the remaining (k — s) components. It is worth mentioning 
that the range space conditions are impossible to check in practice; but they are useful in establishing 



identifiability, as it will be the case for the probabilistic characterization of the ([PoMPi]) equivalence 
when A is random (cf. Subsection IIV-BI) . 

Another set of ([Po|)-(lPi"l> equivalence conditions can be derived from the block restricted isometry 
properties of matrix C as defined in Remark|2l see ifTTI . |51l . However, these conditions are only sufficient. 

Remark 4. Conditions ( fTOb do not depend on b, but only on the range space of A. Thus, whenever A 
satisfies (fTOb . any matrix A' := AG for any nonsingular G G M nxn satisfies (fTOb as well. 

Remark 5. Sufficiency of the conditions in (fTOb remains valid even if some additional constraints of the 



generic form x G C are present in the original problem < \Ppty . In certain applications for instance, the 
unknown x may be non-negative so that C = MJL; or, there may be a priori information of the form 
C = {x : ||x — x c ||2 < R}, dictating the unknown vector to lie in a ball of radius R around a known 



center x c G W 1 . Even though the extra constraints generally reduce the feasible sets of ( |Pq| > and ( |Pi[ ), the 
conditions remain sufficient. Hence, the probabilistic bound to be developed in Subsection IIV-BI remains 
valid even when extra constraints are imposed. 



B. Probability Bound 

As commented earlier, the conditions in Lemma [2] are practically infeasible to check for a given 
sensing matrix A. However, similar to CS (H, it will be possible to prove that the conditions in © 
hold with overwhelming probability (H, i.e., probability decaying exponentially in n when 7 and k are 
fixed, assuming A has i.i.d. Gaussian entries. The main result, summarized in Theorem |2l is based on 
the following lemma. 

Lemma 3 (Deviation Inequality 11191 ). Consider x ~ J\f(O p ,~l p ), and a Lipschitz continuous function 
f : W — > M. with Lipschitz constant L. Then for any t > 0, it holds that 

Pr(/(x) -E[/(x)] < -t) < exp ("^2) • (H) 

This deviation inequality is a special case of more general concentration results fl9l Sec. 1.1]. It 
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provides exponentially decreasing bounds on the tail distribution for any sufficiently smooth function 
/(x) of a multivariate Gaussian x, thus generalizing the Chernoff bound to nonlinear functions. 

Capitalizing on Lemma |3j the next theorem extends the results of |[27l Th. 4] and its refined version 
ll26l Th. 3]. Focusing on the Gaussian case and following a different line of proof, neat closed-form 
expressions will emerge not only for the values of (3 and 7, for which the probabilistic bound is valid, 
but also for the bound itself. The proof is based partly on the methodology of |[25l . where the minimum 
nuclear norm relaxation of the rank minimization problem is analyzed under linear constraints on the 
unknown matrix. In contrast, related probabilistic analysis in ifTTTl and is based on a generalization of 
the restricted isometry property of A that serves only as a sufficient condition for the exactness of the 
convex relaxation; see also flU. 

Theorem 2. Let vector xo be a minimizer of ( |Po| ) satisfying s > k/2 out of the k subsystems, and 
assume that the entries of A £ ^ kmxn are independently drawn from AT (0, 1). If 

V > 4±± (12) 

then whenever m > n^wf/g^y ^ e vector x ' s the unique minimizer of \P\\ with probability exceeding 
1 - e 



-ac (^)n+o n (n) > where ^(0^) := | ^M^l _ A md Q £ ^ ^ 



Proof: To lower bound the probability of success for the ( |Pi[ ) problem, it suffices to upper bound the 
probability that the conditions in dTOb fail, an event denoted by £. Let {£■,} be all the N := M subsets 
of T having cardinality s. Moreover, let £j denote the event of having the conditions in (TTQb failing for 
the partition (Sj , Sj ) 



3v G range(A) \ {0} such that ^ ||vi|| 2 < J] ||v<|| 2 \ (13) 



for j = 1,...,N. The probability of failure can be expressed as Pr {£) = Pr^U^J. The events 

>.r- 



{Sj} 1 ^! are not independent, but Pr(£) can be bounded as 



where inequality (a) comes from the union bound; (b) is due to the symmetry of the distribution of A 

which implies that all the Sj's are equiprobable; and (c) is the standard upper bound of the binomial 

fk\ I 'ke\ s 

coefficient f j — ( — j • Based on (fT4l . the goal now is to upper bound the probability Pt(£j). For 
notational simplicity, the partition corresponding to £j will be denoted by (<S,«S) instead of (Sj,Sj). 
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Given that v £ range(A) \ {0}, there exists a nonzero u £ M n such that Vj = AjU for i = 1, . . . , k. 
To render the inequality in ([T3l scale-invariant, one can study only the cases for which 1 1 u 1 1 2 = 1; hence, 

Pr (Sj) = Pr j 3u with ||u|| 2 = 1 such that ^ || Aiu|| 2 - ^ || A;u|| 2 < J (15) 



= Pr ( /(A) < ) (16) 

where 



/ (A) := ipf j £ W A ^h ~ E W^ u h I 
" U " 2 [i£S i & S ) 



(17) 



The equality from (IT5T > to (fT6l) comes from the fact that if there exists a unit £2 -norm u satisfying 
the inequality in (fl"5l ). then the minimizer of / (A) should also satisfy this property. The function /(A) 
possesses convenient properties which facilitate the application of Lemma[3] Specifically, it is shown in the 
Appendix that: /(A) is Lipschitz continuous with constant L < yk (cf. Lemma|4|; and the expected value 
of the function is lower bounded (cf. Lemma |5]>, that is E [/(A)] > fj, = (j^f ~ l) y/kn(l + o n (l)). 
Hence, for every t > 0, Lemma [3] implies that 

Pr (/(A) < - t] < Pr (7(A) < E [/(A)] - ^ < .e"' 2 /^ (18) 



Upon focusing on fj, and ignoring the o n (l) term, whenever (3 > (»/7 + l)/2 so that /i > 0, and setting 
i = /i in (TT~8T >. yields the bound 

Pr(£,) < Pr (7(A) < 0^ < e - c o(/3,7)«+o„H (19) 

where C0 (/3, 7 ) := (^-lf /2. 

Substituting the bound (O into (O, it follows that 

Pr(£) < exp (7 (/?, 7 ) - n + o n (n) 

<exp(-ci(/3,7)n + o„(n)). (20) 

For every /3 > (^7 + 1) /2, choose ci(/3,7) = ac (/3,7), and define c 2 (/3,7) := ((1 - a)c (/3, 7))" 1 
for any a £ (0, 1). Then, whenever m > c 2 (/3,7)/31og(e//3)/7, the bound in (l20l is nontrivial. ■ 

Remark 6. As a sanity test, the condition (3 > (^7 + l)/2 posed by Theorem |2] coincides with that in 
ll26l Th. 3] after the appropriate mapping of dimensions. However, in Theorem |2j both the values of m 
over which the bound holds, as well as the bound itself are explicitly defined. 

Remark 1. As expected, the condition (3 > (^7+ 1)/2 is clearly stronger than the condition /3 > (7+l)/2 
implied by the uniqueness of the ( [Pq| ) solution in Lemma [TJ 
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V. Robustness to Noise 

In a more realistic sensing scenario, the acquired measurements are corrupted by additive noise. If 5o 
denotes the unknown subset of reliable sensors, the pertinent model is 

= As x + n So (21) 

where ns stands for zero-mean noise assumed independent across sensors. Vector n,s models ambient 
noise, finite precision, analog-to-digital conversion, and quantization effects, communication noise, or 
even, the inadequacy of linear regression to fully capture the measured data b^,, . 

In this noisy case, the unknown xo does not exactly satisfy the linear subsystems in Sq. In an attempt to 
exploit the link between and © when noise is present, one may be tempted to apply the group-Lasso 
regularization, which was originally proposed for recovering block-sparse vectors in a linear regression 
setup [32 j. However, this approach is not applicable because r is not block sparse when noise is present. In 
fact, solvers of the noise-free setups (TPj]) and (T^T) are useful for analyzing uniqueness and identifiability 



issues. In addition, ([Py) and ([f^) solvers are practically suitable for high-SNR sensing applications. This 
motivates the ensuing framework which is suitable for RS in the presence of noise. Without additional 
prior information on the model describing the unreliable sensors, the noisy counterpart of the RS problem 
can be stated as follows. 

Problem Statement 2 (Robust sensing in noise (RSN)). Given {bj, Aj} ie2 - where bj £ M m and Aj £ 

l mx ", for which an unknown subset Sq C I of known cardinality s follows the model in (1211 ). estimate 
the unknown xo by minimizing the least-squares error over any S <ZX with \S\ = s. 

The aforementioned problem statement lends itself naturally to the following optimization problem 

min min ||bs — A^xlln. (22) 
x \S\=s 

The function of x defined by the inner minimization is the pointwise minimum over finitely many convex 
functions, and as such, it is non-convex. Solving (122)) incurs combinatorial complexity since one has to 
solve all the (^) LS problems before solving the outer minimization. 
An optimization problem related to that in (l22l) is the following 

min 

X 

i=l 



£ Mb* " Aix) (23a) 

i=l 

s.t. h(Ti) := < 



^ INIi ' INb - A , A>0. (23b) 
-A 2 , llr^ > A 



2 
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Functional /i(r,j) amounts to the LS cost for residuals smaller than the threshold A, and ignores sensors 
attaining larger residuals. In the scalar case (cf. m = 1), problem (|23T ) has been considered in (6j 
Sec. 6.1.2]. Problems (1221) and d23l ) are related as follows: suppose that for a specific A the solution of 
(123T ) is x* for which there are s* residuals satisfying the upper branch of (I23bb . Then it can be readily 
shown that x* is a solution of (1221 for s = s*. Unfortunately though, /i(r,) is non-convex as well. The 
problem in (|23T ) can be surrogated by replacing /i(r-j) by its closest convex approximation, which is 
pursued in the next subsection by establishing a neat link between the RSN problem at hand and robust 
estimation methods CF7J Ch. 7], (22j Ch. 4]. 

A. RSN and Robust Linear Multivariate Regression 

Building on Remark [3] of Subsection MI- A I the unreliable sensors can be viewed as giving rise to 
outlier-corrupted equations in a linear regression setting. Robust linear regression has been extensively 
studied over the past decades ifTTl . lf22l . 

When m = 1, the RSN problem can be solved by Huber's M-estimator 

k 

x = arg min p(bi — aj x) (24a) 

i=l 

{-r 2 \r\ < t 

2 2 (24b) 

r\r\ — , |r| > t 

where p{r) is the Huber function for r > 0. The problem in d24l) is convex, and can be cast as an 
SOCP |[T5l . lf2TTl . (6l p. 190]. Regarding the cutoff parameter t, when the outliers' distribution is known 
a priori, its value is available in closed form so that Huber's M-estimator is asymptotically optimal; see 
IfTTl Sec. 4.5]. Alternatively, assuming that the noise is standard Gaussian, r is usually set to r = 1.34 
such that the estimator in (l24l is 95% asymptotically efficient at the normal distribution lf22l p. 26]. To 
render Huber's M-estimator invariant to any noise variance a 2 , one has to multiply r by a in d24b| ). If 
a is unknown, a robust estimate of it is commonly used instead [22, Sec. 4.4]. 

The case m > 1, which is of interest here falls under the realm of robust multivariate linear regression 
(21, HI. The novel approach to tackle it will be to postulate a model accommodating inconsistent sensors, 
approximate the meaningful cost of ( |23T ) by a convex one, and solve it using an efficient globally 
convergent algorithm. 

Consider modeling the unreliable sensors using the auxiliary outlier vectors {uj G M m }f =1 . Vector 
Uj = if the i-th sensor is reliable; and Uj ^ deterministic ally, otherwise. Model (|2TT > can now be 
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extended to incorporate the unreliable sensors as 

bj = AjX + Uj + rij, i = 1, . . . , k. (25) 

Since some Uj's are zero, the aggregate outlier vector u T := [uf • • • u£] is block sparse. Hence, using 
the aggregate model b = Ax + u + n, the novel RSN solver amounts to 

CPs) 




where A > is an appropriately chosen tuning parameter. Among the two optimization variables of ( P3 1, 



only the outlier vector u is block sparse. For m = 1, $P^ reduces to the cost proposed in |fT51 and 



shown to be equivalent to (124b . Even when the initial matrix of interest A is tall, ( ji-y always entails the 
fat matrix [A l km ] £ ^mx(n+km)_ The second part 

is a regularization term, reminiscent of the group 
Lasso penalty function 11321 , which is known to promote block sparsity in the u vector. The latter will 
be explicitly accounted for in the forthcoming analysis. 



B. Solving Q 

To better understand §P^ and develop an efficient solver, it is prudent to explore the form of its 
minimizer(s). Let [(x*) T (u*) T ] T denote a minimizer of ( ji-y , and define the associated residual vector 



r* := b — Ax*. Given x*, the vectors {u*}^ =1 in ^P^ can be found separately as the minimizers of 



min 0(uj) (26) 

s.t. (p(ui) := -||r* - ||| + A||uj|| 2 , i = l,...,k. 

Although <ft(ui) is not everywhere differentiable, its subdifferential d<p(ui) can be defined 0. For Uj 7^ 0, 
where </>(uj) is differentiable, the subdifferential is simply Uj (1 + A / 1 1 u.^ 1 1 2 ) — r *- Otherwise, by definition 
and after using OBI , d(p(ui) can be shown to be the set {Xgi — r*} V ||gi||2 < 1- Compactly, 

d^n t ):={ l \ W 1 lT (27) 

{ {\gi-r* : ||gi|| 2 < 1} , ^ = 0. 

Vector u* is a minimizer of d26b if and only if € dcj)(u*). Based on d27l ), two cases are considered. 

First, if u* / 0, the condition G d(f)(u*) yields 

u*(l + A/||u*|| 2 ) = r* (28) 

which means that u* is a positively scaled version of r*. Considering the ^-norm in both sides of (|28T ). 
it follows that ||u*||2 = ||r*||2 — A. Plugging ||u*||2 back into (f28T) . yields u* = r* ^1 — Since 
Il u * lb > 0, this holds if and only if ||r*|| 2 > A. 
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Second, for the minimizer to be u* = 0, there should be a g* for which ||g*||2 < 1 and Ag* = r*, or 
equivalently, ||r*||2 < A. The latter proves that ( IP3I ) indeed admits a block-sparse minimizer u*. 

Substituting u* into d26i yields 0(u£) = ||r£||l|/2, when ||r*|| 2 < A; and <p(u*) = A||r*|| 2 - A 2 /2, 
otherwise. Having minimized §P^ over the Ui's, the minimizer x* can now be found as 

k 

min p v (bi - A^x) (29a) 



=1 



till ' Il r «ll2 — ^ 



s.t. ^fa) := < z " " (29b) 

X 1 1 ri 1 1 2 - t , IN| 2 > A 

where p v {vi) is a vector-generalized Huber function. It is now evident that \P^) is equivalent to (|29l . which 
rather surprisingly turns out to be a generalization of Huber's M-estimator (l24l ) to the vector case. The 
sensors capable of achieving a lower 1 1 1 1 2 value, and are more likely to be reliable, appear in d29l under 
the conventional LS criterion. But the sensors having 1 1 x-^ 1 1 2 > A, contribute ( A 1 1 1 1 2 — A 2 /2) < ||rj|||/2 
to the cost, and are deemed "less important" in specifying x. For the latter set of sensors, u* 7^ holds 



too. Thus, \P%\ not only estimates the unknown vector x, but also reveals the sensors most likely to be 
unreliable in the presence of noise. 

Regarding the cutoff parameter A in \P%) and d29b| ), it is worth noting that when A — > + , the costs of 



( 1291 ) and §P^ tend to the cost of ( |Pi| ). Consequently, for A — > + the data of all sensors are declared to 
contain outliers; and according to the previous analysis, (bj — A,x*) — > u* / for all i. This suggests 



that the solution of ( |Pi| ) does not provide zero residuals anymore. On the other hand, as A — > 00, the 
same costs reduce to the LS criterion, and all sensors are classified as reliable, or u* = for all i. 

A heuristic rule of thumb for practically selecting A is setting it to Tyfm, where r is the equivalent 
parameter for the scalar case and has been selected according to the techniques mentioned after (l24l ). 
If the number of reliable sensors is roughly known (e.g., based on prior operation of the network), an 



alternative approach is solving < \P^ for a grid of A values and selecting the one identifying the prescribed 
number of outliers. Note that solving < \P^ for several values of A can be efficiently performed either 
through the group-LARS algorithm ll32l . or, by using the block coordinate descent algorithm of the next 
subsection with what is called "warm startup" lfl4ll . The latter initializes the tentative solutions of < \P^ for 
a grid value of A with the solution derived for the previous grid value of A. The computational efficiency 
of such an approach has been numerically verified for the Lasso problem lfl4l . ll28l . 

Remark 8. In Problem Statement |2j the noise term was assumed to be independent across sensors. 
Specifications such as the geographical distribution of sensors may impose correlation across different 
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sensor readings. In this case, if the covariance matrix Y, of the aggregate noise vector n T := [rrf ■ ■ • njp 
is known, a standard preprocessing step is to prewhiten the data as b' := £ _1 / 2 b and A' := X1~ 1 / 2 A. 
Prewhitening "spreads" the influence of unreliable sensors across the entries of b'. As a result, the LS 
and l\ -error regression estimators and even the robust Huber M-estimator are not applicable; see also 
|fT51 for similar observations in the scalar case (m = 1). On the contrary, given that u remains block 



sparse, the ( P3 1 estimator can successfully handle a colored noise setup by simply modifying its cost to 



||b' - A'x - 5T 1/2 u||!/2 + A Eti H|| 2 . 
C. A Block Coordinate Descent Algorithm 



As mentioned earlier, ( ji-y is convex. It can be cast as an SOCP and solved by standard, interior point- 
based solvers. An alternative solver of ( IP3I ) exploiting the problem structure and offering computational 
advantages is block coordinate descent, which has been successfully applied to related optimization 
problems |[T3l , (3T|. The core idea behind this solver is to partition the optimization variable into blocks, 
and minimize iteratively the cost w.r.t. one block variable while keeping the rest fixed. 

To apply block coordinate descent to the RSN problem at hand, consider minimizing the cost separately 
w.r.t. x and u. Each iteration involves two steps: In the first step, the objective is minimized w.r.t. x, 
while keeping u fixed, whereas in the second step the roles are interchanged. Specifically, let x^ -1 ) and 
u ('-i) denote the tentative solutions at the (I — l)-th iteration. During the first step of the Z-th iteration, 
fix u = u^" 1 ), and find x^ as the minimizer of the resultant quadratic; that is, 

x W = (A T A)- 1 A T (b-u('- 1 )). (30) 

In the second step, fix x=x^ and find the uf''s as the minimizers of the per-sensor optimization problems 

min -||rf^ - Ui||| + A||ui|| 2 (31) 

where := bj — AjX^ for i = 1, . . . , k. As per d26l ). the solutions of (f3TT > are provided neatly in 
closed forrr 2 



as 

(0 



, ||r«|| 2 <A 

&r) , Wrh^X. (32) 



llr 

The solution in (l32l ) does not require x' 1 ', but only r^. Combining (l30l) and (l3Tb . it follows that 

r W = p^b + P A u( l -V (33) 

2 This is not the case for the colored noise scenario discussed in Remark [8] where the vectors {u;} can then be jointly found 
by any group Lasso algorithm instead 1321 . 
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where P A := A(A r A) _1 A T and := I - P A 

Summarizing, the iterations entail: (a) updating the residuals based on (l33l) : and (b) applying the thresh- 
olding rule in (|32~1 ). As matrix and vector P^b can be computed offline, the most computationally 
demanding operation is the matrix-vector product in step (a). Since km > n, this product would better 
be implemented as (A(AA T )~ 1 ) (A T u) m 0(kmn) operations. The developed algorithm has overall 
complexity O(kmn) per iteration. The presence of zero blocks in u can be further exploited to save 
computations. Numerical simulations demonstrate that the overall complexity of this block-coordinate 
approach is much lower than the complexity of the interior point-based algorithms. 

Due to the specific form of §P^ , convergence of the block coordinate descent iteration follows readily 
from the results of 11301 . The algorithm can be initialized at u^ ^ = 0, so that x« is the conventional 
LS solution. It is terminated when the relative error ||u^ — u^ -1 ) H2/H11W H2 becomes smaller than a 
predefined threshold, e.g., e = 10 -6 . Upon termination, the output is the solution vector u, which reveals 
the sensors affected by outliers, whereas the solution x can be obtained directly from (l30l ). 

D. A Non-Convex Surrogate for RSN 

In the context of robust linear regression, Huber's M-estimator is just one choice from the class of robust 
estimators defined as the minimizers of (1241 for appropriately chosen p functions. It has been argued 
that estimators corresponding to non-convex p functions, such as the bisquare (Tukey's), Hampel's, or 
Andrew's estimators, yield improved robustness-efficiency trade-offs in practice ll22l p. 99]. Similarly in 
the multivariate case, convex M-estimators [4!] are practically replaced by non-convex M- or S-estimators 
appropriately initialized 0. 

Alternatively, it is of interest to explore a non-convex surrogate of ( ji-y paralleling that of Subsection 
IIII-B I Recall that the RSN solver in seeks x and u based on fewer observations than unknowns, but 



taking advantage of u's block sparsity. To further promote block sparsity in u, the ||iij||2 terms in 
can be replaced by log ( 1 1 u,/ 1 1 2 + S) for a small positive 5, to end up with the non-convex problem 

(A) 




Following the majorization-minimization rationale presented in Subsection IIII-B I ( Pj) can be driven to 
a stationary point |[T8l using the iterations 



k 

X (0 )U ('A : =argmin-||b- Ax- + X^w^ ||uj|| 2 , (34) 



x,u 2 

i=i 
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The optimization per iteration of (1341 ) is a weighted version of (j^J), and thus can be efficiently solved 



using the steps (l3Qb and (l32l after replacing A in (l32l with Xp := Aioj^ for all i at the Z-th iteration. 
The iterations can be initialized with the ( ji-y solution which corresponds to setting all weights to unity. 



The simulations of Section |VT] will demonstrate that the $P^ solver outperforms that of in terms 
of the mean-square error (MSE) even after a single iteration. Note that as with d34l , single-iteration 
methods based on non-convex surrogates of the (group) Lasso cost function have been proposed with 
well documented properties 041 . ll23l . 

VI. Simulated Tests 

A. Checking the Weak Bound 

Among the results of Section [IV] the one that can be numerically validated is the weak bound of 
(fT9l ). This bound is termed weak because it refers to the occurrence of a single event £ s , namely, to a 
single partition (<S,«S) with S = s. According to this bound, if f3 and 7 are kept fixed and as long as 
P > (\/7 + l)/2, the probability Pr(£ s ) is arbitrarily small for large n. 

To validate this result, the entries of A are drawn independently from M(0, 1), and the unknown vector 
is modeled as xo ~ n~ 1 / 2 A r (0, I n ). Given that Pr(£ s ) is invariant to the permutations of the subsystems, 
the partition (So, Sq) with Sq = {1, . . . , s} is simply selected. The output of the consistent subsystems is 
b,5 = A,5 xo; whereas for the inconsistent ones b<j o = w is simulated with w ~ A/"(0, I(fc_ a ) m ). Notice 
that due to the selected normalization, the observation vectors have equal variance, i.e., E[||b»i HI] = m 
for all i G X. For several (n, m) pairs, ten values of 7 are selected uniformly over the interval (0.1, 1] 
that correspond to ten values of k. And for every j(k), the number of consistent subsystems s is chosen 
such that f3(s,k) = s/k G [0.5,1]. For each pair (j(k), (3(s,k)), the probability of identifying 
uniquely the < \Pp$ solution is empirically evaluated through 100 Monte Carlo runs. For each experiment, 
the solution of is deemed successful whenever x satisfies ||x — xo||oo < 10" 4 . 

The results are depicted in Fig. [2] Every pair ('j(k), /3(s,k)) corresponds to a circle whose face 
intensity indicates the probability of recovery as explained in the caption. The east and south-east parts 



of Figs. |2(a)p(c)| are not as crowded, since for 7 close to 1, the integer k becomes small, which implies 
that there are not many choices for an integer s € [k/2,k]. The condition for highly probable recovery 
in the weak sense, (3 = l) j% is also shown as a black solid curve. According to the weak bound 

(fT9l ), the circles above this curve correspond to dimension setups with high probability of success for 
large n. The empirically evaluated probabilities validate the result even for moderate values of n. 
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B. Test Cases for RS 

The RS solvers developed are numerically compared in this subsection. The setup involves a network 
of k = 16 sensors collecting observation vectors of size m = 4, and an unknown vector of size n = 20. 
Quantities xo, A, and b, all follow the model of the previous experiment, and the number of consistent 
sensors ranges from 8 to 16. 

The comparison includes: (i) the LS solution of £T|); (ii) the i\ -error regression solution of (0; (iii) 
the \P\) solver; and (iv) the \P$ solver obtained after one iteration of (0. In addition, a genie-aided LS 
(GA-LS) solver knowing a priori the reliable sensors, ^.qa-ls '■= (A^ A^J -1 A^ bs , is implemented 
to serve as a benchmark. The parameter 5 in (0 is set to 10~ 4 , whereas the simulation results were 
insensitive to the range of values from 10 -2 to 10 -8 . 

The sensor detection probability is empirically estimated through 1,000 Monte Carlo experiments. An 
estimate x is considered to have successfully classified the sensors whenever the residual ||bj — AjxHoo 
is smaller than or equal to 10~ 4 for i £ So, and larger than 10~ 4 for i S 5q. As evidenced by Table 



1(a) the LS solution fails to identify the reliable subset. In contrast, the novel <\Pi} scheme shows a clear 



advantage over the l\ -error regression solution, while the empirical detection probability further improves 



for the \P$ method, even after a single iteration. 



C. Test Cases for RSN 

To evaluate the developed RSN solvers, the unknown vector was fixed at xo = \ n j^/n, while the 
reliable sensors followed the model b,s = A,s xo + n,s , with n,s ~ A/"(0, a 2 I sm ) and known a. A 
plausible figure of merit in this scenario is the MSE, E[||xo — x||r,], which was empirically estimated by 
averaging over 1,000 Monte Carlo experiments. 

Comparisons included: (i) the LS estimator; (ii) the GA-LS estimator; (iii) the £i-error estimator of 
©; (iv) the conventional (scalar) Huber's M-estimator of \P$\ (v) the \P\\ solver; (vi) the one-iteration 



solution of \P$ \ (vii) the \P^\ solver; and (viii) the one-iteration solution of \P$ . The value of 8 
parameters in ( jj-y and §P^ turned out to be not critical, and were set to 10~ 4 . The cutoff parameter r 



for the Huber's M-estimator was selected as 1.34<t, whereas A in both ( l-Pjj ) and ([Fy was set to 1.34cry / m. 
It is worth noting that the average number of iterations for the block-coordinate descent algorithm of 
Subsection was between 16 (for SNR= 10 dB) and 30 (for SNR= 25 dB), while its execution time 
was 1,000 times lower than that of a standard SOCP solver. 



In Fig. 3(a) the MSE achieved by each method is plotted versus the number of consistent sensors 



s for SNR = 10 dB. The curves show that the block-sparsity ignorant LS, l\, and Huber's estimators 
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are generally outperformed by the novel schemes. The ( |Pi[ ) and ( ji-^ solvers, originally designed for the 



RS task, still exhibit reasonable performance that worsens as s — > k. The fli-y estimator shows a slight 



improvement; but its solution serves as a good initialization point for the one-iteration estimates of (|JJ 
Note that the derived RSN solvers combine robustness with efficiency in the absence of outliers. 

To test the effect of correlated sensor measurements, the following experiment was performed. The 
reliable sensors were modeled again as b,s = As xo + ns , the unreliable ones as = n w + 
where n w ~ A/"(0, 1(fc_ s ) m ), while [n^ ] T ~ jV(0, S) and 5] is a symmetric Toeplitz matrix with 
first column [1 0.9 0.9 2 • • • 0.9 fcm_1 ] T . The two RSN solvers were modified according to Remark [8] 
Fig. |3(b)| shows the MSE curves obtained at SNR = 10 dB. In this correlated noise setup, the superiority 
of RSN solvers is even more prominent. 

Correctly classifying the sensors as reliable/unreliable is critical. Once a method has completed this 
classification task, the estimation of xo can be performed based solely on the sensors classified as reliable. 
Assuming successful classification, the MSE performance of GA-LS can be attained. The probability of 
correct sensor classification was evaluated in another simulation setup that differs from the previous ones 
in the following ways: problem dimensions were (n, m, k) = (80, 8, 32); the reliable sensors followed the 
linear white Gaussian model at SNR = 5 dB; had entries independently drawn from the zero mean 
Laplacian distribution with variance (<r 2 +l); and r and A parameters were set to a and cjy/m, respectively. 
The solvers (i)-(iii) and (iv)-(v) do not provide a classification mechanism, hence, a sensor was deemed 
reliable when its residual ^2-norm was smaller than 10 -4 . The Huber's estimator (iv) can identify outlying 
scalar measurements and a sensor was considered correctly classified when all its measurements were 
correctly classified. For (P3) and (-P4), the identification followed naturally from the Uj vectors. The 
results are listed in Table |I(b)| The majority of methods fail to identify the reliable sensors and yield 
an empirical probability close to (1 — s/k), which is the ratio of unreliable sensors. The improvement 
offered by Huber's estimator is marginal, while (P3) and in particular (P4) outperform all others. 

VII. Conclusions 

Contemporary approaches to compressive sampling and variable selection in linear regression problems 
exploit (block) sparsity present in the signal of interest. The fresh perspective offered in this work broadens 
the scope of sparsity-exploiting algorithms to settings where model mismatch induced by unreliable 
sensors or outliers gives rise to (block) sparse residuals, even when the signal of interest is not sparse. 
This perspective links compressive sampling and sparse linear regression with two important problems: 
(i) finding the maximum number of feasible subsystems of linear equations; and (ii) robust multivariate 
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linear regression. Capitalizing on these links, robust sensing algorithms were developed to reveal unreliable 
sensors and recover the signal of interest based on reliable sensors. In the absence of noise, necessary and 
sufficient conditions were provided for exact recovery (identifiability). Their probabilistic characterization 
showed that they hold with overwhelming probability when the regression matrix is Gaussian distributed. 
In the presence of noise, the RS task was reformulated to a combinatorial problem that was subsequently 
surrogated by (non-)convex costs. The two subsystem-aware robust estimators derived can be solved 
by an efficient block coordinate descent algorithm. The simulated tests demonstrated that all proposed 
schemes succeed in the task for which they have been designed for. 

Appendix 

Proof of Theorem [7} The sufficiency of the conditions in (fTOb is shown first. Recall from Lemma [2] 



that the conditions in ( fTOb imply that xq is the unique minimizer of (|Pq|). Let 5 denote the set of reliable 



wrt xq sensors with |<S| = s > k/2 for which = A^xq. Vector xq is the unique minimizer of ([p 



too if and only if the vector (xo — u) for any nonzero u £ W n yields a strictly larger ( |Pi| ) cost than xo 
does. Indeed, letting v := Au, the cost attained by (xo — u) is 

k 

^ ||b» - Ajx + Vj|| 2 = ^2 H b * ~ AiX o + v ih + ^2 H b » ~~ A * x ° + Vi H 2 
i=l «e5 ieS 

- ^2 ii Vi " 2 + ^2 " bi ~ AiX ° + Vi " 2 

ies its 

> 2^ ll Vi ll 2 + 2^ H b * ~~ A * x oll2 - \\ v ih > 2^ " bi ~~ A * X 0||2 

ieS ie5 ieS »=i 

where equality (a) uses that = A^xo, inequality (6) stems from the reverse triangle inequality, and 
inequality (c) is due to the assumed conditions of the theorem, and again the fact that b,s = A^xo- 

Necessity is shown by proving the contrapositive. Specifically, it must be shown that if there exists 
a v G range(A) and an (5,5) partition of X with |5| = s for which Sies Il v «ll2 ^ Sie5 H V *H 2 ' 



then there exists an xq that attains a minimum (Pq i cost of s, but is not the unique minimizer of (Pi 



Suppose that bs := A^xo and b§ := A^xo/2 for an (5,5) partition with |5| = s > k/2. Vector 
xq obviously minimizes ( |Pq| ), whereas xq/2 does not since |5| < |5|. Assume v := Axq £ range(A) 



an d Sie>s Il v «ll2 < Sie5 Il v «ll2- ^ lS eas y to check that the ( |Pi| ) costs attained by xq/2 and xq are 



respectively X^ie5 1 1 v « lb/2 and Ylies II v « lb/2- Hence, it has been shown that xq/2 attains a ( |Pi| ) cost 



not greater than that of Xq, i.e., xq is not the unique minimizer of ((Py). This concludes the proof. 
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Lemma 4 (Lipschitz continuity of /(A)). The function /(A) defined in (fTTT ) is Lipschitz continuous with 
Lipschitz constant at most \fk. 

Proof: Let A, A' G l fcmx ' 1 and w, w' G M n be the minimizing arguments of /(A) and /(A'), 
respectively. The difference of the function at these two points is 

/(A) - /(ao = I e ii ahi 2 - E 11 A Hi2 ) - ( E 11 A ^ w/ ii2 - E n A * w 'i 

\ie<S i S 5 / \ie-S ies 

< (Eii a ' w, iI2-Eh a ' w, iI2] - (eh^'^-eii^'i 

<Ell(A«-A / t )w / || 2 + Ell(A 4 -AOw / || 2 < sup EH A * u ll2 

i&S ie s l|u||2=1 i=i 

where inequality (a) holds because w is by definition the minimizer of /(A); (6) follows from the 
reverse triangle inequality applied on each subset; (c) holds trivially for ||w'||2 = 1; and Aj := Aj — A'^. 
Now, define the function appearing in the right-hand side of the last inequality as 

k 

g(A) := sup V H A Hl2 (35) 

so that /(A) - f(A') < g(A). Since /(A') - /(A) < 5 (-A) = g(A), it holds that |/(A) - /(A')| < 
g(A). Given that g(0) = 0, if g(A) is Lipschitz continuous with constant at most L, i.e., |g(A)| < 
L||A||jr, where ||A||i? is the Frobenius norm of matrix A, then /(A) is also Lipschitz continuous and 
its constant is at most L. Hence, it suffices to show that g(A) is Lipschitz continuous and its constant 
is upper bounded by Vk. 

To proceed, recall first that the ^2-norm of a vector x G R n can be written as (6l p. 637] 

||x|| 2 = sup{x T y : ||y|| 2 < 1}. (36) 

Using (|36l ), g(A) can alternatively be expressed as 

k 

g(A) = sup sup V v f A * u ( 37 ) 

||u|| 2 =l||v I || 2 <l i=1 

which is a supremum over infinitely many linear functions of A, and as such it is convex. Recall that 
if a function / : W — > R is convex with a subgradient s(x) for which || sup x s(x)||2 is finite, then / is 
Lipschitz with constant L < \\ sup x s(x)||2- This claim can be proved by the definitions of the subgradient 
and the Lipschitz constant. Thus, it suffices to find a subgradient of g(A) and upper bound its norm. 
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If u* and {v* i}* =1 are the maximizers of <?(A), then a subgradient is given by the matrix G(A) 



with 1 1 1 1 2 = 1 and ||v* i||2 < 1 for i = 1, . . . , k. The norm of this subgradient is 



\ i=i 



\\G(A)\\ F 

The bound is independent of A, and the proof is complete 



\ i=l 



\ X] ll v *.»lli - vh- 
\ i=i 



Lemma 5 (Expected value lower bound). For ?/ze random matrix A € M' rax " w/f/i entries drawn 

'2/3-1 



independently from J\f (0, 1), if holds that E [/(A)] > ji with /i := y — lj \/^n (1 + o n (l)). 
Proof: Consider rewriting /(A) using (l36l ) as 

f (A) = inf sup inf vjAjii + zjAjU. 

ii u ii2=i|| Vl ii 2 <iii^i!2<it^ 



(38) 



«65 ie,s 

Next, introduce auxiliary random vectors y G M n , Sj G M m , tj G W 71 for i = 1, . . . , k, and w G 
having their entries drawn independently from A/"(0, 1), and define the functional 



h y (u, Vj, Zj) := vf AjU + ^ zf AjU + v^ic 
h x (u, Vj, Zj) := ^ vf Si + ^ zf tj + \//cu T y, for 



L r,n^ rilr».ll_ < 1^^ 



|u|| 2 = l, {|K|| 2 < 1}* =1 and {Hzilla < l}f =1 . 



(39a) 
(3%) 
(40) 



Consider now the triplets (u, Vj, Zj) and (u', v-, z^). By using the i.i.d. property of the random variables 
appearing in the functionals, it holds that 

E [h y (u, Vj, Zj) h y (u, V-, z-)] = u T u J] vj v- + ^ zf z- + fc 



E (u, v u zi)h x (u , Vj,z-)] = ^vfvj + ^zfz'j + ku T u 



whereas the difference of the two expectations is 



E[/ij / (u,Vj,Zj)/ij / (u / ,v^,z / j)] -E [h x (u, Vj, Zj) h x (u', v^, z^)] = (u T u'-l) ^ vf v- + J] zf z'j - A; 



By exploiting the properties of vectors u, Vj, and z, in (1401 . it follows readily that 

E [h x (u, Vj,Zj) h x (u, Vj,z-)] = E [h y (u, Vj,Zj) h y (u, v-,z-)] , 
E [h x (u, Vj,Zj) h x (u', v.,z-)] < E [h y (u, Vj,Zj) h y (u',v-,Zj)] . 



(41a) 
(41b) 
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To proceed, the following lemma is needed fT6l Cor. 10]. 



Lemma 6 ( 111610 . Let {Xijk} and {Yijk} be two zero-mean Gaussian processes indexed by (i,j,k) for 
i = 1, . . . , n, j = 1, . . . , m, and k = 1, . . . ,p, which satisfy the following conditions: 



(cl) E 



v2 
^ijk 



E 



Y 2 

ijk 



for all (i,j,k). 

(c2) For any two triplets a = (i,j,k) and a' = (i',j',k'), E[X a X a >] > E{Y a Y a >] if i = i' and 
j ^ j', and E [X Q J a /] < E [1^,1^'] in all other cases. 
Under (cl) and (c2), it holds that 



E 



max min max 



> E 



max min max K- 



(42) 



Even though the indexes (i,j,k) are denumerable, by using the compactness argument of |[25l Pr. 1], 
the comparison in (l42l extends to minimizations/maximizations over compact sets as well. Mapping the 
Xijk (Yijk) variables of Lemma[6]to — h x (u, Vj, Zj) (— h y (u, Vj, z,)), it can be verified that the conditions 
of the lemma are met (cf. (|4TT)). and upon using (1421) deduce that 



E 



sup inf sup — h x (u, Vj, Zj 

|u|| 2 = l ll Z i||2<l ||v i || 2 <l 



> E 



sup inf sup —h y (u,Vi,Zi 

|| U || 2 = 1 IW|2<1 || Vi || 2 <l 



Given that sup x —f(x) = — ini x f(x), the previous inequality is equivalent to 



E 



inf inf sup /i x (u,Vj,Zj) 

M|a=l ||zi|| 2 <l || Vi || 2 <l 



< E 



inf inf sup h y (u, Vj,Zj 

l|u||2 = l ||Zi||2<l ||V»|| 2 <1 



But since the random variable w in d39ab is zero mean, the right-hand side of the last inequality is equal 
to the desired expected value, E [/ (A)]. Thus, it has been established that 



E [/ (A)] > E 



inf inf sup h x (u, Vj,Zj 

II 2 =1 ||Zi|| 2 <l |j Vs || 2 <l 



Using the definition of h x (u, Vj,Zj) and exploiting the separability of the optimization, as well as the 
properties in (l40l) . one arrives at 



E [/ (A)] > sE [\\si\\ 2 ] - (k - s)E [||ti|| 2 ] - v^E [||y|| 2 ] 



(43) 



Recall that if x ~ M(0 n , I n ), then 1 1 x| 1 2 is chi-distributed with n degrees of freedom, and mean value 

„ 

(44) 



x 2 



B (f'l) 

where B (•, •) denotes the Beta function. Applying (l44l) three times in (l43l yields 



E [/ (A)] > p = (2s - k) 



Vk- 



2tt 



B (f'l) B (f'^) 
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Using the standard approximation ^ = y/n(l + o n (l)) (U Formulas 6.1.46 and 6.2.2], and for 

IT' 2 j 

fixed 7 = n/{km) and fc, it also holds that = y/n/y/jk(l + o n (l)). Thus, the bound ft can be 

compactly expressed as ft = — lj V^fcn(l + o n (l)), which concludes the proof. ■ 
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Fig. 1. A wireless sensor network linked with a fusion center. (Un)reliable sensors are color coded as (red) green. 



TABLE I 

Empirical probability of successful sensor classification (%). 



(a) RS task with (n, m, k) = (20, 4, 16). (b) RSN task with (n, m, k) = (80, 8, 32). 
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(c) n = 20, m = 10. (d) n = 20, m = 2. 

Fig. 2. Empirical probability of success for $Pi\ and the weak bound of l |19l l (solid black line). Empty circles correspond to 
quadruplets (n, m, k, a) with perfect empirical recovery and solid black circles to problem setups having failed in all experiments. 
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(a) White noise. 
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(b) Colored noise. 
Fig. 3. MSE performance for RSN with (n, m, k) = (20, 4, 16). 
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